output code
Contrastive ECOC: Learning Output Codes for Adversarial Defense
Chou, Che-Yu, Chen, Hung-Hsuan
Although one-hot encoding is commonly used for multiclass classification, it is not always the most effective encoding mechanism. Error Correcting Output Codes (ECOC) address multiclass classification by mapping each class to a unique codeword used as a label. Traditional ECOC methods rely on manually designed or randomly generated codebooks, which are labor-intensive and may yield suboptimal, dataset-agnostic results. This paper introduces three models for automated codebook learning based on contrastive learning, allowing codebooks to be learned directly and adaptively from data. Across four datasets, our proposed models demonstrate superior robustness to adversarial attacks compared to two baselines. The source is available at https://github.com/YuChou20/Automated-Codebook-Learning-with-Error-Correcting-Output-Code-Technique.
Reviews: Error Correcting Output Codes Improve Probability Estimation and Adversarial Robustness of Deep Neural Networks
Summary: The region of uncertainty (prediction probability close to 0.5) for softmax of logits is extremely small near an M-1 dimensional hyperplane in the logits space. The reason is changing one of the logits for one of the classes affects the probability vectors in all dimensions. The authors show that, if each logit is first converted to an independent probability using 1/(1 exp(-x)) function and the probability vector correlated with each codeword of an error correcting in a soft way to decode, this method has a large volume of uncertainty. The volume of uncertainty is larger when the min hamming distance of the code is large. This because multiple logits must be changed at the same time to cause a wrong decoding.
Generalization emerges from local optimization in a self-organized learning network
We design and analyze a new paradigm for building supervised learning networks, driven only by local optimization rules without relying on a global error function. Traditional neural networks with a fixed topology are made up of identical nodes and derive their expressiveness from an appropriate adjustment of connection weights. In contrast, our network stores new knowledge in the nodes accurately and instantaneously, in the form of a lookup table. Only then is some of this information structured and incorporated into the network geometry. The training error is initially zero by construction and remains so throughout the network topology transformation phase. The latter involves a small number of local topological transformations, such as splitting or merging of nodes and adding binary connections between them. The choice of operations to be carried out is only driven by optimization of expressivity at the local scale. What we are primarily looking for in a learning network is its ability to generalize, i.e. its capacity to correctly answer questions for which it has never learned the answers. We show on numerous examples of classification tasks that the networks generated by our algorithm systematically reach such a state of perfect generalization when the number of learned examples becomes sufficiently large. We report on the dynamics of the change of state and show that it is abrupt and has the distinctive characteristics of a first order phase transition, a phenomenon already observed for traditional learning networks and known as grokking. In addition to proposing a non-potential approach for the construction of learning networks, our algorithm makes it possible to rethink the grokking transition in a new light, under which acquisition of training data and topological structuring of data are completely decoupled phenomena.